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The multi-index matching problem (MIMP) generalizes the well known matching problem by 
, going from pairs to d-uplets. We use the cavity method from statistical physics to analyze its 

properties when the costs of the d-uplets are random. At low temperatures we find for d > 3 a 
frozen glassy phase with vanishing entropy. We also investigate some properties of small samples 
by enumerating the lowest cost matchings to compare with our theoretical predictions. 
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I. INTRODUCTION 



The statistical properties of random combinatorial optimization problems can be studied from a number of angles, 
with tools depending on the discipline. Recent years have however witnessed a convergence of interests and techniques 
across mathematics, computer science and statistical physics. An archetype example is the matching problem with 
random edge weights, defined as follows: suppose one has M different jobs and M people to perform them, one person 
per job, and let dj be the cost when job i is executed by person j; the 2-index matching problem consists in assigning 
jobs to people in such a way as to minimize the total cost. The statistical properties of the optimal matching when the 
cost Cij are drawn independently from a common distribution were found two decades ago using the replica [l| and the 
cavity Q methods. These two non-rigorous statistical physics approaches have recently been used to tackle a number 
of computationally more difficult problems such as satisfiability or graph coloring, but the 2-index matching problem 
sets apart for belonging to one of the very few problems where such predictions have been rigorously confirmed |3j • 

In this work, we take the statistical physics approach and study the properties of a generalization of the 2-index to 
multi-index matching problems (MIMPs) where the elementary costs are now associated with d-uplets, representing 
for example persons, jobs and machines when d = 3. At variance with the 2-index matching, d-index matching 
problems with d > 3 are NP-hard. We show here that their low lying configurations also have a different, glassy, 
structure whose description requires replica symmetry to be broken. Remarkably, the replica symmetry breaking 
scheme differs from the common picture that has emerged from the study of other optimization problems such as 
the coloring and satisfiability problems p|- In particular, a naive application of the 1-RSB cavity method at zero 
. temperature [(|, which successfully solves these two problems, is here doomed to fail. The reason for this will be 
traced back to the presence of "hard constraints" . By unraveling this specificity, we put forward arguments whose 
relevance goes beyond matching problems; they indicate when a similar scenario can be expected on other constrained 
systems. The particularly simple glassy structure that we find is also of interest from the interdisciplinary point of 
view: in conjunction with the rigorous formalism available for the 2-index case, it places MIMPs in a choice place for 
I ■ working out a most awaited mathematical understanding of replica symmetry breaking. 

The present paper provides an extensive account of our results on the MIMPs, some of which have already been 
mentioned in Q. The paper is organized as follows. We first define precisely multi-index matching problems, and 
briefly review the past approaches from physics, mathematics and computer science that were developed mainly to 
address the 2-index case. Then we start our statistical study by establishing the scaling of the minimal cost as a 
function of the number of variables and by providing a lower bound from an annealed calculation. A large part of the 
paper is then devoted to present our implementation of the cavity method to matching problems, including a detailed 
discussion of its relations with the rigorous formalism proposed by Aldous; we explain why and how replica symmetry 
must be broken when d > 3, in order to account for the presence of a frozen glassy phase. Finally, the last section is 
dedicated to a numerical analysis of small samples that provides support to the proposed scenario. 



II. MULTI-INDEX MATCHINGS 



A. Definitions 



Two classes of MIMPs can be distinguished, d-partite matching problems and simple d-matching problems, whose 
asymptotic properties will be shown to be related. We first start with the d-partite matching problem that corresponds 
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to the version alluded to in the introduction. An instance consists of d sets, A\,. . . , Ad, of M nodes each, and a cost 
c a is associated with every d-uplet a — . . . , id} G Ax X • • • X Ad. Graphically, it is represented by a factor graph 
as shown in Fig. ^ with hyperedges (factor nodes) joining exactly one node from each ensemble. A matching M. is a 
maximal set of disjoint hyperedges, such that each node is associated to one and only one hyperedge of the matching; 
it can be described by introducing an occupation number n a G {0, 1} on each hyperedge a, with the correspondence 

a G M n a = 1- (1) 
The condition for a set of hyperedges to be a matching can then be written 

Vr = 1, . . . , d, Vi r G A r , ^ n a = 1. (2) 

a : i r (Ea 

The d-partite matching problem consists in finding the matching with minimal total cost, 

C$ = min c a n a (3) 

M a 

with the {n a } subject to the constraints J2J). We consider here the random version of the problem, where the costs c a 
are independent identically distributed random variables taken from a distribution p{c), and we are interested in the 
typical value of an optimal matching in the M — ► oo limit. For definiteness, we take for p the uniform distribution in 
[0, 1], but the asymptotic properties of d-matchings depend only on the behavior of p close to c = 0, and are identical 
for all distributions p with p(c) ~ 1 as c — > 0, such as the exponential distribution, p(c) = e~ c . The case p{c) ~ c r , 
r > 0, can be treated along the same lines, but gives different quantitative results. 

A variant of this setup is the simple <i-matching problem, where a unique set of N nodes, with N being a multiple 
of d, is considered and a cost is associated to each d-uplet of nodes (see Fig. [2J. The d-partite case can be seen as a 
particular instance of a simple d-matching problem where the hyperedges joining more than one node of any A4 are 
given an infinite cost. Simple d-matchings problems are formulated as finding 

Vff = min c a n a (4) 

M a 

under the constraints 

V* = 1,...,JV, Y, Ua = 1 - ( 5 ) 

a : i(Ea 

Before presenting our analysis of random matching problems by means of an adaptation of the cavity method for 
finite connectivity statistical physics models, we briefly review past approaches to the subject, with an emphasis on 
open questions that motivated the present study. 



B. Physical approach 

The 2-index matching problem was the first combinatorial optimization problem to be tackled with the replica 
method, an analytical method initially developed in the context of spin glasses |8(. In the paper [lj, Mezard and 
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Parisi analyzed both the simple and bipartite matching problems for cost distributions p with p(c) ~ c r as c — > 0. 
Using replica theory within a replica symmetric Ansatz, they derived the minimal total cost; thus, for the bipartite 
matching with r = 0, they predicted limji/->oo — 7r2 /6; moreover, they obtained the distribution of cost in the 
optimal matching. Support in favor of their prediction has first come from numerical results and from an analytical 
study of the stability of the replica symmetric solution 0, . This last analysis further yields the leading corrections 
of order 1/N for the value of the minimum matching. 

Interestingly, the same results can be reobtained using a variant of the cavity method based on a representation 
of self-avoiding walks using m-component spins |2(. This alternative formulation, avoiding the bold prescriptions of 
replica theory, furthermore suggests that, if the cost of the hyperedges connected to a given node are ordered from 
the lowest to the highest, the probability for the k-th hyperedge to be included in the optimal matching is 2~ k [TTI| . 
as first conjectured from a numerical study [l2j| . 



C. Mathematical approach 



Replica theory, while a powerful tool to obtain analytical formulae, is not a rigorously controlled method, and its 
predictions have only the status of conjectures within the usual mathematical standards. For the 2-index matching 
problem with r — however, the results mentioned above (value of the optimal matching, distribution of costs, and 
probability of inclusion of fc-th hyperedge) have all been confirmed by a rigorous derivation, due to Aldous 3] . His 
contribution also includes the proof an asymptotic essential uniqueness property that mathematically expresses the 
fact that replica symmetry indeed holds for this problem. The weak convergence approach [l3l ] on which the proof 
is built is closely related to the cavity method we will employ, and the relations between the two formalisms will be 
discussed in Sec.| IVDl Confirmation of the £(2) = tt 2 /6 value for the bipartite assignment problem also comes from 
the recent proofs |1 4 Il5| of a more general conjecture formulated by Parisi 0|; this conjecture states that, for the 
bipartite matching with exponential distribution of the costs, p(c) = e~ c , the mean optimal matching for finite M is 

These mathematical contributions are part of a more ambitious program aiming at developing rigorous proofs and 
possibly a rigorous framework of the replica and cavity methods. Interestingly, Talagrand, one of the prominent 
advocate of this program, devotes the last chapter of his book on the subject [ItJ to the 2-index matching problem, 
stressing that, in spite of the major advances mentioned, it stays a particularly challenging issue. Indeed, finite 
temperature properties have so far resisted to mathematical investigations, even in the limit of high temperature, 
that has been successfully addressed in other spin-glass like models [T3|- We shall comment on the peculiarities of 
matchings with respect to other constrained systems in Sec. IV Bl It is our hope that our work not only provides new 
challenging conjectures, but also suggests some hints for solving unanswered preexisting mathematical questions. 



D. Computer science approach 

If analytical studies of random d-matchings by statistical physicists and mathematicians have been restricted up to 
now to the d — 2 case, d-index matching problems with al > 2 have a longer history in the computer science community, 
(i-partite extensions of the bipartite matching problem were introduced in 1968 under the name of multidimensional 
assignment problems [Tsj : they are also referred in the literature as multi-index assignment problems, and, more 
specifically, as multi-index axial assignment problems (to distinguish them from the so-called planar versions [ial20|). 
MIMPs, as we call them (for multi-index matching problems), have a number of practical applications. The most 
commonly cited one is for data association in connection with multi-target tracking |2]|. Besides a major interest for 
real-time air traffic control, such approaches are for instance helpful for tracking elementary particles in high energy 
physics experiments [22^ . 

From the algorithmic complexity point of view, matching problems have also a pioneering role since the 3-index 
matching problem was among the first 21 problems to be proved NP-complete |23| . In contrast, polynomial algorithms 
are known that solve 2-index matching problems [2^|. Note that being based on a worst case analysis, NP-hardness 
is however only a necessary condition for hard typical complexity, which is the issue which interests us here. Due to 
their intrinsic algorithmic difficulty and to the broad range of their applications, generalized assignment problems are 
the subject of numerous studies in the computer science community; we refer to the reviews |19l |20j for additional 
information and references. 
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III. SCALING AND A LOWER BOUND 



The first task in studying random optimization problems is to determine the scaling of the optimal cost with the 
number of variables [25|]. Here, we address this issue for the two variants of MIMPs, the multi-partite and simple 
multi-index matching problems. The scaling is inferred from an heuristic argument, and confirmed by an annealed 
calculation (first moment method) yielding a lower bound. This leads us to a statistical physics formulation that 
encompasses the two versions of MIMPs. 



A. Scaling 

The statistical physics approach of combinatorial optimization problems consists in defining the energy E(Ai) of 
each admissible solution, here a d-matching M, as its total cost, E(M) = ^2 aeM c a , and in determining the minimal 
total cost, identified with the ground-state energy, by looking at the zero temperature properties of the system. For 
(i-matchings, the corresponding Hamiltonian 

H[{n a }} = ^c a n a (6) 

a 

defines a lattice gas model, where the particles are occupying the hyperedges. The constraints J2J or (J5J implement 
a hard-core interaction between the particles: two "neighboring" hyperedges are not allowed to be occupied simulta- 
neously. To have a sensible statistical physics model, the ground state has to be extensive, i.e., proportional to M 
in the e?-partite case, and to N in the simple case. We propose here a heuristic argument to determine how E[C^ ] 

and E[L^' 1 ] scale with M and N respectively, where E[-] represents the average over the different realizations of the 
costs. The central (local) quantity that monitors the scaling behavior is the number of hyperedges to which a given 

node belongs, noted (A = M or N). Indeed, with the costs uniformly distributed in [0,1], the lowest costs 

to which a node can be associated are of order and the optimal matching is expected to scale like A/W^ . 

Thus, for d-partite matchings, W$ = M^ 1 and E[C^ } ] - M 2 - d , while for simple d-matchings, = (^) and 

E[X^r ] ~ (d— l)\N 2 ~ d . We will therefore be interested in computing the (finite) quantities 

C {d) = lim M*-*E[C$], 

M— »OG 

N d ~ 2 ( ? ) 
N*"*oo (d— 1)! 



t [d) = Jim t^E^I 



The factor [d — 1)! in the second definition is meant to reflect the different number of hyperedges to which a given 
node can connect, in the d-partite and simple versions (this difference is absent when d — 2). With this convention 
we will find the equality = dL^ d \ where the remaining d factor merely comes from the fact that the total number 
of nodes is N for simple d-matchings, but is dM for d-partite matchings. 



B. Annealed approximation 



When energies are extensive in the size N of the system, the equilibrium properties of a statistical physics model are 
entirely encoded in the partition function, Zn(/3) — ^2 M e~ l3E ( M \ or, equivalently, in its logarithm, the free energy 
F/v(/3) = — log[Zjv(/3)]//3. The free energy depends on the realization of the elementary costs, but it is expected to be 
a self-averaging quantity, i.e., such that the free-energy density f(0 ) = limjv^oo Fn((3)/N exists and is independent 
of the sample. The self-averaging property is proved for d = 2 |26| . and we assume here that it holds for d > 3 as 
well. The value of the optimal matching is given by the ground state energy, obtained as lim^oo f({3), where the 
free energy is calculated by performing a quenched average of the partition function, E[lnZ], with E[-] referring to the 
average with respect to the realization of the elementary costs. 

A much simpler calculation is the annealed average, lnE[Z]. Due to the concavity of the logarithm, it yields a lower 
bound on the correct quenched free energy, f an (0) = — \nK[Z]/(Nf3) < — E[ln Z]/(N/3) = f(f3). In fact, since the 
entropy s((3) = 2 df3f(f3) is necessarily positive for a system with discrete degrees of freedom, the free energy f(/3) 
must be an increasing function, and a tighter lower bound can be inferred for the ground-state energy , 



lim /(/?)> sup / an 09). 

/3^oo /3>0 



(8) 
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These considerations are made under the hypothesis that the energies, or equivalently the temperature (3, are 
correctly scaled with N , so that lim^—^ f((3) is indeed finite. Reciprocally, requiring the annealed free energy to be 
extensive provides us with the appropriate scaling of (3. For d-index matching problems, we have 



E[Z] = E 



5> 



0T, a c a n a 



(#M)E[e 



-0c a] #{a&M} 



(9) 



where #A4 denotes the total number of possible matchings and #{a € M.} the number of hyperedges contained in a 
given matching. For d-partite matchings, #.M = (M and #{a 6 A4} = M. To enforce the correct scaling of the 
free energy, we anticipate a rescaling in temperature of the form (3 = M a (3, yielding 



lnE[Z] = [d- 1 - a]MlnM - [ln/3 + d - 1]M + o(M). 
An extensive annealed free energy is therefore obtained by taking a = d — 1, in which case 

In /3 + d - 1 



(d-part)0\ _ 



(10) 



(11) 



The scaling 1/(M(3) ~ M 2 d we obtain corresponds to one introduced in Eq. (J7J). For simple d-index matchings, 
#M = Nl/[(N/d)l(dl) N / d } and #{a e M} = N/d. Rescaling the temperature as f3 = N 4 ' 1 ^ we get 

lnE[Z] = -[ln/3 + d- 1 - ln(d - l)!]jV/d + o(JV). (12) 

To make contact with the d-partite case however, we adopt a slightly different scaling, (3 = N d ~ 1 f3/(d — 1)!, so that 

ln/3 + d-l 



(simple) ^ _ 



d:3 



(13) 



This annealed calculation illustrates the correspondence between the d-partite and simple d-matchings stated in the 
previous section. Apart for the trivial factor d, corresponding to the relation N — dM , the equality is obtained by 
normalizing differently [3 and 0, thereby accounting for the difference in the number of hyperedges a given node locally 
sees [extra factor (d— 1)! in Eq. (jj))]. The annealed free energy is a concave function with maximum for /3*j = e 2 ~ d so 
that we get lower bounds > e d ~ 2 and > e d ~ 2 /d. 



C. Statistical physics reformulation 

From now on, we will cease distinguishing between d-partite and simple d-matchings, and consider a unique sta- 
tistical physics model that describes both problems in a common framework. Our approach is indeed based on the 
cavity method [27j for which only the local properties at the level of each node are relevant, and we have seen that 
by making the appropriate scalings of (3, we can match the local properties of both models. The Hamiltonian we 
consider is 

n[{n a }}=Y l ^ri a , (14) 

a 

with the £ a = M d ~ 1 c a uniformly distributed in [0,M d_1 ] for the d-partite case, and £ a = N d ~ 1 c a /(d— 1)! in 
[0, N d ~ 1 /(d — 1)!] for the simple case. The (inverse) temperature, denoted by (3 to simplify, will correspond to (3 for 
the d-partite case and (3 for the simple case. The only remaining difference kept is the factor d between the two free 
energies, accounting for the relation N = dM. Unless explicitly stated, the formulae to be given hold for the simple 
version ; to get the d-partite counterparts, one has consequently to multiply by d the intensive quantities. 



IV. REPLICA SYMMETRIC SOLUTION 



The approach we adopt to treat the d-matching problems is the cavity method recently developed to solve statistical 
physics models defined on finite connectivity graphs |27| . This section explains the formalism of the replica symmetric 
solution for general d. While the correctness of the replica symmetric approach is a mathematical fact when d = 2, 
we show that it leads to some inconsistency when d = 3, requiring replica symmetry to be broken. 
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A. From complete to dilute graphs 

The hypergraph on which an instance of the simple d-matching problem is defined is complete, in the sense that 
every possible hyperedge arises once and is given a random cost. However, the factor nodes with the smallest 
elementary costs are more likely to belong to the optimal matching; for instance the probability that the k-th most 
costly hyperedge originating from a given node will be included in the optimal 2-matching is 2~ k This suggests 
that hyperedges with large costs can be ignored while retaining most of the structure relevant to the determination 
of the optimal matching. Eliminating hyperedges results in a diluted hypergraph, where each node is connected to 
only a restricted number of hyperedges. From this point of view, in spite of being defined on a complete graph, 
random matchings are effectively closer to statistical physics models defined on finite connectivity random graphs. 
In fact, such a feature already transpired from the initial replica treatment |TJ of 2-index matchings where all the 
multioverlaps Q ai ...a p were required, and not only the two replica overlaps Q ai a 2 y like in usual Curie- Weiss mean field 
models of disordered systems [8j. 

To exploit the underlying diluted structure, one possible method is to introduce a cut-off C, suppress all nodes 
with rescaled cost £ a > C, solve the matching problem on the diluted hypergraph, and finally send C — > oo. The 
hypergraph obtained by this procedure is Poissonian: if £1, £2 > • • • are the costs ordered in increasing sequence of the 
hyperedges connected to a given node, the probability for the connectivity to be k is 

p k = Prober <-..<&< C < 6 +1 <-]= (*£ ) (^J (l - - ^e-O, (15) 

with Wf ] giving the number of hyperedges to which a node is connected, as in Sec. IIII Bl Diluting the complete 
graph has a major drawback however: the diluted hypergraph typically does not allow any matching at all, since for 
instance there is always a finite probability e~ c that a given node is isolated. 

To circumvent this problem, we come back to the model on the complete graph and start by weakening the 
constraints, allowing a node not to belong to a matching, at the expense of paying an extra cost. In more physical 
terms, we view a matching as the close-packing limit of a lattice gas model whose particles are subject to hard-core 
interactions: particles can occupy the hyperedges but two hyperedges connected through a node can not both admit 
a particle. We introduce a grand- canonical Hamiltonian 

'H f i[{n a }} = ^£, a n a - dn^n a = J^(£a - d^i)n a . (16) 

a a a 

where d/i is a chemical potential per hyperedge (/z per node). In the limit fi — > 00, the maximum number of hyperedges 
is occupied by a particle and we recover the matching problem. For finite /i however, the constraints reflecting the 
hard-core repulsion are 

v», 

a : i£a 

to be compared with the hard constraints of Eq. (|SJ), recovered only in the /i — ► 00 limit. Each value of /1 defines 
an optimization problem whose minimum energy corresponds to a zero temperature limit (3 — * 00. The solution 
of the matching problem thus appears as the result of a double limit, (3 — > 00 and jj, — ► 00. The point is that a 
diluted structure is now naturally associated with the system at finite //. Indeed since H,^ is minimized by taking 
n a — whenever £ a > d/z, the ground state is unaffected if all hyperedges a with £ a > dfj, are suppressed, yielding the 
Poissonian hypergraph considered above with C = rf/i. This construction allows us to formulate the initial MIMP as 
the limit /1 — > 00 of optimization problems defined on Poissonian graphs with increasing mean connectivity dfi. We 
will give in Sec. II V Pl an alternative construction based on regular graphs. 

B. Cavity method 

The problem at finite fi defined on a Poissonian hypergraph can be studied by means of the cavity method as 
developed for finite connectivity graphs | 27j ; one of the main advantages of this method over the replica method or 
previous versions of the cavity method |2| is that it allows for a practical investigation of replica symmetry breaking 
(RSB). Since we are interested in the ground-state properties, the cavity method directly at zero temperature seems 
particularly well suited |2S| . However, it will turn out to be necessary to get the finite temperature equations as well, 
and we therefore work at finite 0, postponing the discussion of the f3 — > 00 limit to the next section. 
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FIG. 3: Local structure of a Poissonian hypergraph. When 
the node i is removed, it leaves a rooted-tree with root a. 
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FIG. 4: Distribution V{x) of cavity fields for 3-index 
matchings at different temperatures j3 in the replica sym- 
metric approximation. These distributions are obtained 
by population dynamics, using algorithm P described in 
Appendix A, with parameters C = 60, A/iter = 1 and 
N p0 p = 200000. 



In strong analogy with Aldous' framework (see Sec. UVDj) . the RS cavity method associates the diluted hypergraph 
with an infinite tree or, stated differently, a tree with self- consistent boundary conditions. The starting point is 
however finite rooted trees, that is trees with a singularized node i called the root. Consider for instance the part 
of a tree represented in Fig. given a hyperedge a and one of its connected nodes i (relation noted i € a) , we call 
Z( a ^ 1 ) the partition function of the system defined on the rooted-tree with root a resulting from the removal of i. To 
express it in terms of the partition functions Z^ h ^^ where j refers to the nodes, connected to a, but distinct from i 
(noted j € a — i), we decompose Z^ 11 ^^ as Z^ a ^^ = Z^ a ^^ + z[ a ^ l \ where Z^^^ and Z^^ 1 ^ are the conditional 
partition functions where the root a is either constrained to be empty or occupied by a particle. As an intermediate 
stage in the recursion, we also introduce Y^^°^ and Y^^ a \ which are defined similarly to Z^^ 1 ^ and z[ a ^ l \ but 
for rooted-trees whose root is the node j, in absence of the hyperedge a: the index 1 means that j is already matched 
and the index that it is not. With the notations of Fig. we have the relation 

z («-x) = Yl (y ij ^ a) +Y^ a) 

jea-i 

jea-i 





bej-a 

bej-a cej-{a,b} 

These formulae have simple interpretations: for instance, the first line means that when a is empty, the neighboring 
nodes j £ a — i can be equally matched or not with upstream hyperedges, while the second line means that when a 
is occupied, it generates a cost £ a — dfi and requires the nodes j £ a — i to not be matched. From the conditional 
partition functions, we define the cavity fields 



° , V (19) 

Z 

These definitions are made to insure a proper scaling when fi — > 00 and recover quantities used in previous studies for 
d — 2. Note however that for finite /3, it is more natural to introduce ip^^ a '> = exp[f3(x^^ a " > — fi)] interpreted as the 
probability that node j not matched in the absence of a (or cquivalcntly that j is associated to a in a matching); this 
alternative notation will turn out to be particularly convenient when discussing the freezing phenomenon, in Sec. lVBl 
On a given rooted tree, it follows from Eq. I|18(l that the fields attached to the different oriented edges are related by 



8 



the following message-passing rules, 



,(o-»0 



E 3 

j£a-i 



E< 



-/3(«a-« (t " 3) ) 



(20) 



The limit of infinite rooted trees is taken implicitly by considering the stationary distribution V (x) that is assumed to 
result from the repeated iteration of the message passing relations. By definition V(x) is a distribution of cavity fields 
over the different oriented edges that satisfies the following self-consistent equation, called the RS cavity equation, 



k d-l 



V(x^)=E k , £ J f[ J] dx^V(x^)S (x^ - x^[{x^}]j 



(21) 



o=lja=l 

where the function x^ k '^ is defined according to Eq. (|2t)|l as 



a(*.0[ {a; Ci.) } ] = _I l n L-/SM + ^ e -/Jtt-Eti* UB) ) 



(22) 



and the expectation expresses the average over the disorder, which includes both an average over the connectivity 
k and over the rescaled costs £, 



(23) 



The RS cavity equation 1)21(1 can be solved by a population dynamics algorithm, whose principle is presented in 
Appendix A; the resulting distribution V{x) for d = 3 and different (3 is shown in Fig. V{x) contains all the 
information on the equilibrium properties and, in particular, allows one to compute the free-energy density. It can be 
derived from the Bcthc approximation which produces on a given hypergraph the formula 



J2AF (i+aei) (P) - J2( £ a - l)AF( Q )(/3) 



(24) 



where l a is the degree of hyperedge a, which here is l a = d independently of a. The shifts AF^ +ae ^(/3) and AF^- a \(3) 
correspond respectively to the free-energy shift induced by the addition of a node i together with its connected 
hyperedges a £ i, and to the free-energy shift induced by the addition of hyperedge a. They are given by 



-l3AF ii+aei) (fj) _ 



2 "T" I I 



U) 



-/3A_F< a >(/3) 



riaSi 11/ £ a— i (Yq 

z^ ] + z[ a) 



') 



(25) 



where we introduced the analogs of the partitions functions for rooted tree, but on the complete trees: 



z<"> = 



n w 



Z 



(a) 



j£a 



jea 



Y, 



U) _ 



(6-i) 

! 



(26) 



be.; 



E« = 



j:z[ b ^ n ^ 

6ej cej-b 
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Physically, z[ a ^ /{Z^ + z[ a ^) gives the probability for the hyperedge a to be included in the matching. By averaging 
over the realizations of the disorder, since the mean number of hyperedges per nodes is fj,, we get 

f RS (0) = E[AF('+ Q ^(/?)] - (d - l)/xE[AFW(/3)] (27) 

with explicitly 

k d-1 / k 



" J o=lj a =l V a=l / 

E[AF<°>(/9)] = -~E £ / JJda;Cj)p( a .(j))] J1 ^i + e -«6.-S?=i»i)J . 

J A — i 



C. Integral relations 

The fj, — > oo limit can be taken explicitly. The corresponding equations generalize the formulae established by 
Mezard and Parisi in their first treatment of the 2-index matching problem 1] . For general d, they are 



(29) 



Given G(£), the energy e(/3) and entropy s(f3) are 



1 

= -T7 / dlG(f)e- G ®, 
(id J_ c 



s(j3) = / dZ e- e ' - e- ^] - ^— ? / diG(t)e- G ®, 



(30) 



and the free energy is obtained as f((3) — e(/3) — s((3)/(3. The relation between the function G(l) and the order 
parameter V{x) is, up to a change of variable, a Laplace transform, 

r+oo 

e~ G « = / dxV(x)e~ e . (31) 



From the practical point of view of numerically solving the cavity equations, the finite \x cavity equations are however 
easier to handle than these compact formulae. 

D. Zero temperature limit 

In view of an extension of the mathematical approach from 2-index to d-index matchings with d > 2, it is interesting 
to discuss in some details the relations between our equations and those used by Aldous in his rigorous study of the 
2-index matching problem Aldous' formalism is obtained from our RS cavity equations by taking the zero 
temperature, f3 — > oo. When (3 — > oo, Eqs. (|2U[) become 



u 



(a->i) 



j&a-i (32) 
a-tf-HO = m in (£ b - u ( 6 ^i)). 

bej-a 



Taking [i = oo and d = 2 leads to the recursive distributional equation 29] , 



»w =niin(6-a;W) ( 33 ) 
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on which Aldous' work is based [3:]. A difference is however that its costs derive from a Poisson point process 
(the uniform distribution does not make sense when [i — oo). This Poisson process can nonetheless be related to our 
formalism by implementing a variant of the cut-off procedure. Consider selecting at each step of the cavity recursion 
the k parents of smallest costs, k being now fixed. Then the successive k costs are distributed according to a Poisson 
process with rate one. Nonetheless, while the cavity recursion is perfectly well defined, the corresponding system on 
a given hypergraph does not make sense: a hyperedge may belong to the the list of the hyperedges with the k-th 
smallest costs for one of its node but not for an other one. This is why we introduced the version with a cut-off on the 
costs, which constitutes for finite [i a perfectly sensible statistical physics model. From a purely formal point of view 
the version with cut-off on the number of connected clauses works as well, and provides an alternative formulation 
for numerically solving the cavity equations (see Appendix A for the details and Fig. for an illustration). 

The cavity fields at zero temperature have an interpretation in terms of differences in ground-state energies. The 
cavity field x^^ a ' corresponds to the extra cost of a particle on node j with respect to no particle, in the absence 
of hyperedge a, and the cavity bias u^ a ~*^ to the cost of connecting the node i to the hyperedge a. Note that these 
quantities are actually well defined only if fj, is kept finite, otherwise a particle cannot be removed or added without 
destroying the perfect matching, i.e., without leaving the space of admissible configurations. Similarly, the total fields 
on the complete graph are 



jea (34) 

From the interpretation given, it appears that the hyperedges which indeed participate to the optimal matching 
are those which achieve the minima, i.e., the solution is given by 

n a = <L „*, a* = argmin(£ a - u (a->t) ). (35) 

a 

Since this has to hold for all i E a, the question arises whether this prescription effectively defines a matching, i.e., 
whether argmin Q (^ a — u*"^')) = argmin a (£ a — u^ a ~*^') for all i,j £ a. A positive answer is obtained by generalizing 
to d > 2 the inclusion criterion invoked by Aldous when d = 2, which states 

a* = argmin(£ a - u {a ^) = 1 <^=> £„ < u {a ^ i} + x^ a l (36) 

a 

The independence on % is then a consequence of the dentity — £/( a ) . The proof of the inclusion criterion 

itself is straightforward with the present notations: if a* — argmin a (£ a — u( a ~**)), 

£„. - u {a '^ = min(£ b - u {b ^) < min (& - u^) = x^ a "\ (37) 

bG'i b£i — a* 

Reciprocally, if a ^ argmin b (£ b — u < - b ^^), 

£ a - u (a ^ 2) > min(£ b - u^ 1 ')) = min (& - i/ 6 ^) = x^ a \ (38) 

b£i b(zi— a 

As an alternative to the Bethe formula, the value of the optimal matching can be obtained by inferring (£ a *) from 
the distribution of the fields V(x). Thanks to the inclusion criterion, we have 

i i c°° 

4s = - d £a*) = ^ J dC £ PK>b(tf > 

1 ,00 , d Id \ (39) 

= ~ d J <%) II teP&i) * * I X> " H 

where the factor d corresponds to the number of nodes per hyperedge and 9 represents the Heaviside function, 9(x) = 1 
if x > and otherwise. The RS cavity equations at zero temperature can also be written in terms of closed integral 
relations that generalize known equalities for the d = 2 case, 

G(x) = [ fldtj&Me- 8 ™ U + f> I , (40) 

• y >: ' ! j i V 1=1 J 
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FIG. 5: Annealed and replica symmetric free energies for 
the simple 2-index matching problem. While the annealed 
curve has a maximum at 1/2, the RS curve is strictly mono- 
tone. Its (5 — > oo limit corresponds to the exactly known 
value tt 2 /12 ~ 0.82. 



FIG. 6: Replica symmetric free energy /ijs(/3) for the 3- 
index matching problem with different cut-offs C = d\x = 
30, 45, 60 as a function of the inverse temperature j3. This 
curve has been obtained using algorithm P of Appendix A 
with parameters Af p0 p = 20000 and A/iter = 20000; for 
comparison, the annealed free energy is also represented 
with a dashed line. In inset is a zoom of the data with 
C = 60 more clearly displaying the non-physical decrease 
of fns(J3) for /3 > A, ~ 0.41. 



r (d) _ 1 
Lrs ~ 2d 



>: ■•• ••• , i 



-G(xj) 



(41) 



The distribution G(x) is related to the RS distribution V of the cavity fields by 

G{x) = - In / dtV(t), 



and can be obtained from the finite temperature order parameter G(l) = Gp{l) given in Eq. 12911 by 

G(x) = lim G^^-^x). 

(3— *oo 



(42) 



(43) 



Comparing with the predictions of the cavity method, C^ s 
condition that the RS distribution must satisfy 



j[A 6 (*+<»e0] _ (d- l)^E[Ae( a >] , we obtain the consistency 




(44) 



where the average E[-] is here taken with respect to V. This formula is indeed numerically verified with a good precision, 
in agreement with the equivalence between the two approaches. As a corollary, it shows that Mix] < unless d = 2 
where K[x] = (we recall that in this case one has in fact an explicit formula 0, V(x) — l/[4cosh (x/2)]). Finally, we 

note that for d > 2, the RS energy at zero temperature is only dependent on the mean of V(x), C^ s = — E[x]/ (d — 2). 
However as shown in the following, the RS approach yields incorrect predictions when d > 3. 



E. Entropy crisis 



Using the population dynamics algorithm described in Appendix A, we obtain for the RS free energy fasiP) the 
curves displayed in Fig. [S] (d = 2) and|S] [d = 3). For d = 2, the free energy is an increasing function of (3 with limit 
Irs{P — oo) — 7r 2 /12 ~ 0.82 corresponding to the cost of a minimal 2-index matching. The free energy obtained 
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d = 3 



C 0.004 

A RS 




FIG. 7: RS entropy of the 3-index matching problem. The 
data is from population dynamics, using algorithm R pre- 
sented in Appendix A, with K = 50, M pop = 50000 and 
A/iter = 5000. The line is a linear regression. The RS 
entropy is found to vanish at /3 C = 0.412 ± 0.001 
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FIG. 8: Stability analysis of the RS solution for the 3-index 
matching problem at finite temperature. (In fir) /r is plot- 
ted versus r for different temperatures j3 (from algorithm 
P given in Appendix A with C = 36, A/" p0 p = 20000 and 
A/lter = 10 9 ). The RS solution is stable if the slope of 
(In Hr)/r is negative (see text), which is found to be the 
case for (3 < Pi ~ 0.6. 



for d = 3 is qualitatively different, as it displays a maximum at a finite temperature (3 S ~ 0.41 (see FigQ. This 
entropy crisis reflects an inconsistency of the RS approach 30] . If one assumes the RS approximation holds at high 
temperature in some range of temperature (a non-trivial statement), a phase transition must occur at some j3 c < (3 S . 



F. Stability of the replica symmetric Ansatz 

Replica symmetry fails to correctly describe the low temperature properties of many frustrated systems 8]. A 
necessary requirement for its validity is that it be stable. Here we show that when d = 3 the RS solution is unstable 
below a strictly posisive temperature, that is for j3 > Even if the breakdown of the RS hypothesis was already 
inferred above from the negative value of the RS entropy, studying the stability is instructive since the relative 
positions of Pi and (3 S will establish the discontinuous nature of the phase transition. In [j| , Mezard and Parisi used 
the replica method to prove that the RS Ansatz is stable when d = 2 ; their approach is however quite complicated 
(see j 1 it for a recent reexamination of their analysis) , and to tackle the d = 3 case, we adopt a simpler approach based 
on the cavity method |3lj . Physically, it amounts to computing the non- linear susceptibility \i an( i checking that it 
does not diverge [12 . Picking a hyperedge labeled at random, this susceptibility is written 

oo 

X2 = $>on a >2 „ J2[C(d - l)] r E[(n n r ) 2 c ] (45) 

a r=0 

where E[-] denotes the thermal average and E[-] the spatial average over the disorder. Using the fluctuation-dissipation 
relation, the averaged squared correlation function E[(noTi r )c] between two hyperedges separated by distance r can 
be expressed in terms of the cavity fields as 

E[(n n r ) 2 c ] ~ E 

where the average E[-] is performed with respect to the distribution of the disorder (fc, £) and to the distribution V(x) 
of the cavity fields, except for the with i > 1 which are fixed by x^+i^ = x l - k '^(x il , . . . , x i(d l)k ). To determine 
whether the series in Eq. (|45|l converges or not, we compute 

ln^ r = rln[C(<2- 1)] +lnE 

by using cavity fields from the population dynamics, and check whether linv^oo (In /i r )/r < or not. The numerical 
results are limited to small values of r, but as shown in Fig. [H] they are sufficient to conclude unambiguously that 



n 



'8X^{2 



lit ) 



dx. 



(r -> oo), 



(46) 



n 



dxi 



(47) 
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an instability shows up for 3-index matchings at (3i ~ 0.6, thus confirming the incorrectness of the RS Ansatz for 
describing the f3 = oo limit (the same procedure with d = 2 consistently finds no instability). In addition, since the 
instability takes place only after the entropy crisis, fa > f3 s , we conclude from this analysis that the phase transition, 
located at j3 c < f3 s , must be discontinuous as a function of the order parameter. 



V. REPLICA SYMMETRY BREAKING 



The inconsistencies of the RS Ansatz indicate that replica symmetry must be broken in the low temperature phase. 
This feature is present in many other NP-hard combinatorial optimization problems and is commonly overcome by 
adopting a one-step replica symmetry breaking (1RSB), which, in most favorable cases, turns out to be exact. 



General 1RSB Ansatz 



As formulated by Aldous with the essential uniqueness property Q , replica symmetry in matching problems means 
that quasi- solutions, that is low energy configurations (LECs), all share most of their hyperedges. In contrast, replica 
symmetry breaking (RSB) refers to a situation where LECs arise, which, while being close in cost to the optimal 
solution, are far apart in the configurational space (the measure of distances is the overlap between two matchings, 
i.e., the fraction of common hyperedges, see Sec. IV C|) . One-step replica symmetry breaking (1RSB) is a particular 
scheme of RSB where the structure of the set of LECs can be described with only two characteristic distances, do and 
d\ < do- For it to be correct, two LECs taken at random (with the Gibbs probability measure when working at finite 
0) must be typically found either at distance do or d\. In the replica jargon, close by LECs (at the short distance 
di) are said to belong to the same state (or cluster). At the level of 1RSB, it is assumed that the number A/jv(/) of 
states with a given free energy / grows exponentially with N and is characterized by a complexity £(/) defined by 
E(f)=]im N ^ 00 [bxAf N (f)]/N. 

The 1RSB cavity method derives this "entropy of states" by a Legendre transformation method mimicking the 
derivation of entropy from the free energy in canonical statistical mechanics |33| . The object generalizing the free 
energy is the replica potential </>(/?, m); the parameter m is the Lagrange multiplier fixing the free energy of the 
relevant states, in the same way that the temperature /3 selects the energy of equilibrium configurations in the 
canonical ensemble. The replica potential is defined as 



-Ni3m<t>{/3,m) 



(48) 



where the sum is over the states a, and f a denotes the free energy of a system whose configurations are restricted to 
a. To obtain the relevant states for the equilibrium properties, replica theory prescribes to choose the m in [0, 1] that 
maximizes cf>(/3, m) 8|, so that the equilibrium free energy is given by 



/irsb(/?) = max <j){/3,m) 

0<m<l 



(49) 



Calculating <f>(/3,m) requires introducing as order parameter a distribution Q[Q^^ a '] over the oriented edges (j — > a) 
of distributions Q^~* a \x) of the cavity fields, taken over the different states a [2^J . The 1RSB cavity equations for 
the order parameter read 



/k d— 1 
n n vQ Ua) Q[Q Ua) }8 [q (o) - Q (k ' () [{Q Ua) }} 

o=lj =l 

#*'«[{Qfc«>}](z<°>) = i|nn dx^Q^(x^)S (x^-x^({x^})) e -/3™AF<^> 

J a=l 3a = l ' 

where x^ k & is given by Eq. (122H and the reweighting term is 

k 



(50) 



(51) 



a=l 



The latter corresponds to the shift of free energy due to the addition of the new node. Its presence insures that the 
different states described by the Q^~ >a \x) have indeed all the same free energy, in spite of the fact that the addition 
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of a node inevitably introduces a free-energy shift. The distribution Q[Q] determines the replica potential </>(/?, m) 
whose explicit expression is 



<p({3,m) = E[$ (l+aei) (/?,m)] - (d- l)^E[$ (a) (/?,m)] 



with 

E[$( i+aei )(£,m)] 
and 



n n ^Q (ja) Q[Q 0o) ]in 



a=l Ja = l 



O=lj =l 



a ,(ia)) e -m/3AF^-»({ a; <^»}) 



(52) 



(53) 



E[$ (a >03,m)] = -1e£ f Y[VQ {3) Q[Q u) }\n f ]J dx^ (x U) ) (l + e^"^ A 
" J j=i J j=i 



(54) 



The 1RSB equations can in principle be numerically solved via a population dynamics algorithm 27]. However, 
our efforts in this direction failed to yield a sensible order parameter because the fields were found to diverge as fi 
was increased : the reason for this behavior is elucidated below. 



Frozen 1RSB Ansatz 



Although rarely explicitly mentioned, there exists a replica symmetry breaking Ansatz somewhat intermediate 
between the RS and general 1RSB as just described. The frozen 1RSB Ansatz, which will be argued to apply to 
matchings, is a particular realization of the 1RSB scheme where states are made of single configurations (or, more 
generally, of a non-exponential number of configurations) . In such a case, all the information can be extracted from 
the RS quantities, provided they are adequately reinterpreted. Consider for instance the definition given by Eq. i|48|) 
in the special case where states a have no internal entropy, i.e., f a — e Q . We thus have 



-Npmf a 



-Nf3mf RS (Pm) 



(55) 



where the last equality holds because of the very definition of a RS free energy. The replica potential <f> can therefore 
be expressed in term of the RS free energy only, 

m m)=f RS (pm). (56) 

Following the prescriptions of replica theory, the quenched free energy is obtained by maximizing <f>((3, m) over 
m 6 [0, 1]. Being a concave function function, the RS free energy can have at most one maximum. If (3 S denotes the 
location of this maximum (with maybe (3 S — oo, like for 2-index matchings), we obtain that Jib.sb{P) = 0(1, /3) = 
Jrs(P) for (5 < P s and Jirsb{P) — 4>(Ps/ (3, (3) = Jrs(Ps) for > (3 S . In other words, starting from the assumption 
that the content of states is trivial, the frozen Ansatz predicts a complete freezing of the system at the point (3 S where 
the RS entropy becomes zero : for [3 > (3 S , the system is trapped in a single configuration and its free energy stays 
constant when the temperature is further decreased {(3 increased). 

This scenario is already known to apply to a few models of disordered systems, including the random energy model 
(REM) [34|, the directed polymer on disordered trees |3f|, the binary perceptron [3(j and the XOR-SAT problem on 
its core |37| (with a particular case being error-correcting codes of the Gallager type 38]). Our intention is here both 
to add the matchings to this list, and to clarify the conditions under which such a scenario may apply. At this stage, 
we can already state the following necessary conditions (all satisfied by d-index matchings with d > 3): 

(i) the RS entropy must become negative at a finite (3 S ; 

(ii) the RS solution must be stable up to (at least) j3 a ; 

(iii) no discontinuous 1RSB transition must be detected before (3 a . 

In addition to these properties, the consistency of the frozen Ansatz requires the model to have particular kinds 
of constraints, called hard constraints. Elucidating this point requires a more refined description of the relation 
between the frozen 1RSB order parameter and the RS order parameter. First remember that in the RS picture at 



finite temperature /3, one has a spatial distribution Vix^ - " 1 ^) of cavity fields, where following Eq. (|19fl . ijj 



exp[, 



x 



IIS 



fi)] is interpreted as giving the probability under the Boltzmann measure that node j is not matched 
given that the hyperedge a is absent. For a general 1RSB problem, the order parameter is instead Q[Q^^ a ^ {x^^ a ^ )] 
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where ip0^ a ) = exp[/3(x^^ a ) — fi)] is again a thermal probability, but now restricted to a particular state taken from 
the distribution over states Q^^" a > . In this context, a RS system, characterized by a sing le state, has Q^ a ) (?/> ( ^ a) ) = 

S(ip^^ a ^ — ^^g* )■ For a system in a frozen glassy phase instead, the thermal averages inside each state are trivial 
since there is a single frozen configuration, ^w - * ) = or 1 meaning that a particle is present or absent with probability 
one. Therefore, the relation with the RS order parameter has the form 

Q(i— J^W— )) = ^ls a) 8{^^) + (1 - ^ a) )H^ a) - !)• (57) 

Plugging this expression into the general 1RSB cavity equation, it is found that such an Ansatz is consistent only if 
the system satisfies the condition that in the cavity recursion, the variable on a node is completely determined by 
the values of the variables on the neighboring nodes. Such is the case with matchings when fj, = oo where a particle 
is to be assigned to a hyperedge if and only if none of the neighboring edges are occupied. This is however not the 
case in all constraint problems. Consider for instance the 3-coloring problem where each node is assigned one of three 
colors with the constraint that its color must differ from its neighbors : in the case where all the neighbors have the 
same color, the choice is left for the node between the two other colors. When a variable is fixed by the value of its 
neighbors in the cavity recursion, we say that the system has hard constraints ; hard constraints can be shown |33 | to 
indeed be present in the binary perceptron and in the XOR-SAT model on its core, models where the frozen Ansatz 
applies too. Finally, we note that in the presence of hard constraints, the cavity fields ■0 ( J^ a ) take at the 1RSB level 
values and 1 only, which are associated with and — oo. This explains the divergences observed when 

trying to implement the 1RSB population algorithm at zero temperature with \x — > oo. 



C. Distances 



As mentioned in Sec. IV Al a 1RSB glassy system is generally described by two distances, do corresponding to the 
typical distance between two states, and d\ corresponding to the typical distance between two configurations inside 
a common state. In the case of a frozen 1RSB glassy phase, one has however di = and the structure of low-energy 
configurations (LECs) is characterized by only one distance, do. If (n a ) denotes the mean occupancy of a particular 
hyperedge a, with the average (•) taken over the LECs, the probability for a to belong to two different LECs is given 
by (n a ) 2 - Averaging over the different hyperedges, it defines the overlap 

q = n(n a )% (58) 

which is directly related to the typical distance between LECs through do = 1 — q. As argued before, for a system 
in a frozen glassy phase the distribution of energies of the LECs is described by the thermal average at (3 C in the RS 
approximation, so that 

v _ Yf a) 1 

{Ua} - yW + yW _ i + e -A«.-i: le . (59) 

Averaging over the disorder therefore yields 

q = E[(n Q )|J = E$ / JJ dx^V(x^) (l + e H3ott.-E« e .* w ))~ 2 . (60) 

The overlap q((3) is represented for all values of (3 in Fig. 1 101 when d — 3; given the value of (3 C obtained before, we 
get q = q(/3 c ) = 0.321 ± 0.002. 



VI. NUMERICAL ANALYSIS OF FINITE SIZE SYSTEMS 



The theoretical analysis provided concerned the M — > oo limit. How is that limit reached, and in particular is 
the convergence exponentially fast in M or is it algebraic? To answer such questions, we consider in this section the 
properties of d-partite matchings when M is finite; in the absence of other tools, we do this numerically. It should be 
clear that the most challenging questions concern the low temperature phase of our system; because of that, we will 
focus on the optimum matching and low lying excitations. Even though such a numerical approach requires sampling 
the disorder (random instances) and extracting for instance distributions with inevitable statistical uncertainties, it 
will give evidence that our frozen 1RSB Ansatz is correct; it will also provide some statistical properties of finite size 
systems that are of interest on their own. 
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FIG. 9: RS free energies for d = 4 as obtained from the two 
versions of the population dynamics algorithm described in 
Appendix A (here with A/" pop = 10000 and Mter = 1000). 
Note that the approximation based on Poissonian graphs 
(mean connectivities C = 100, 120, 160, 200) approaches 
the solution from below, while the approximation based 
on regular graphs (fixed connectivities K = 30, 40, 50, 60) 
approaches it from above. As expected, the two limits 
C — > oo and K — > oo are found to match. 



FIG. 10: Overlap q(/3) = E[(n a } 2 ] in the 3-index matching 
as given by Eq. JHHJ. In particular q(/3 c ) = 0.321±0.002 de- 
scribes the typical overlap between two low-energy match- 
ings, that is the fraction of hyperedges generically share. 



A. The branch and bound procedure 

When M is very small, it is possible to enumerate all [M!] rf 1 d-partite matchings of a given sample. Not surprisingly, 
this becomes unwieldy even when M reaches 10, forcing us to choose an alternate approach. Since it is the low energy 
matchings that are of greatest interest, we have developed a branch and bound algorithm that computes the p lowest 
energy matchings, for any given p. Some technical aspects of the algorithm are presented in Appendix B, but the 
essential elements are as follows. 

We represent a matching via a list of M hyperedges, one for each of the M sites of the first set (recall that there 
are d sets, each of M sites). Such a representation includes also some non-legal matchings as some of the sites in 
the second or higher sets could belong to more than one hyperedge; if a matching is not legal, it is discarded. This 
representation can be mapped onto a rooted tree: each level of the tree is associated with one of the sites of the first 
set, while a segment (branch) emerging from a node corresponds to a choice of hyperedge that contains the site of 
that node's level. The root node is associated with the first site, the nodes of the next level are associated with the 
second site, etc... This tree is regular, each node having M d ^ 1 outgoing segments as there are that many hyper-edges 
containing a given site of the first set. Furthermore, it has M + 1 levels: there is one level for each site of the first 
set while the last level consists of leaves rather than of nodes; each leaf corresponds to a candidate matching specified 
by the list of hyperedges obtained when going from the tree's root to that leaf. This list may correspond to a legal 
matching or not, but each matching appears exactly once as a leaf. (In fact, there are M M ^ d ~~ x > leaves while there 
are only [Ml] _1 legal matchings.) 

The principle of the branch and bound algorithm is to find those leaves which satisfy the desired criterion (the 
energy must be less or equal to that of the pth. lowest energy matching) by exploiting a pruning procedure, thereby 
avoiding having to explore all leaves. To begin our pruned search, we produce p distinct legal matchings and put 
them into a list C; the largest energy of the matchings in this list is an upper bound Ejjb on the pth energy level for 
our system. Then we start at the level of the tree's root and consider all of its segments; for each choice of segment, 
the search problem corresponds to finding matchings on a smaller system with one less site in each of the d sets; the 
search can thus be implemented recursively. Suppose we have done k recursions; the sub-problem is associated to 
the node on our tree that is obtained by following the choices of hyperedges in the recursive construction. This node 
corresponds to a partial matching in which the first k sites of the first set have each been assigned a hyperedge. An 
important property is that all hyperedges have positive energies; then we know that any matching that is compatible 
with the current partial matching has an energy greater than it, thereby providing a lower bound on all the leaf 
energies obtainable from the current node. If that lower bound is greater than Ejjb, then the subtree rooted on the 
current node can be pruned (discarded from the search); otherwise, one iterates the recursion (that is one performs 
branching on the different choices of the hyperedge to include at the present level) and k goes to k + 1. When this 
process leads to a leaf that corresponds to a legal matching, we compute the energy E of this matching. If E < Ejjb, 
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FIG. 11: Mean ground-state energy density as a function 
of 1/M at d = 3. The line is the quadratic fit using M > 10 
data. Inset: the rescaled standard deviation of the ground- 
state energy, suggesting a central limit theorem behavior. 
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FIG. 12: Distribution of the extensive ground-state energy 
for increasing M values (from left to right) at d = 3. 



we insert that matching into our list C and remove its worst element so that it always has p elements; we also update 
Eub which by definition is the largest energy of the matchings in £; on the contrary, if E > Eub, we discard the 
matching (leaf). After a finite number of branchings and prunings, the algorithm has explored all choices for the 
segments emerging from the tree's root and one is done. The best p matchings are then in the list C. 

The algorithm without pruning requires 0(M M ^ d ~ 1 ^) operations; with pruning and the different optimizations 
sketched in Appendix B, the number of operations grows roughly by a constant factor when M is increased by 1; in 
particular, for the random instances studied here and d = 3, this factor is about 2.2. 



B. Ground state energies 

We generated a large number of random samples (disorder instances with the hyperedge costs taken to be inde- 
pendent uniformly distributed random variables in [0, 1]) and for each sample determined its ground state. We used 
several random number generators to check that our results were robust. Because of the exponential growth of the 
computation time with M, in practice we were limited to relatively modest values of M . For the results presented 
here and involving only ground states, at d = 3 we used 10000 samples for M = 20 and M = 22, while for the smaller 
values of M we used 20000 samples. We also performed runs at d = 4 but with lower statistics because the algorithm 
becomes less efficient as d increases; in fact, we were limited to M < 14 for that case and had only 5000 samples for 
each M. 

Let's first focus on the behavior of ground-state energy. For each sample, we determine with our Branch & Bound 
algorithm the ground-state energy density eo = E /M = M d ~ 2 cff [cf. Eq. J7J)]; then we can analyse its mean in our 
ensemble or consider other properties of its distribution. 

In Fig. 1111 we show how the mean ground-state energy density E[eo] changes as one increases M. The behavior is 
roughly linear in 1/M, but by eye one can definitely see some curvature. Because of this, linear fits do not give good 
values of x 2 unless the M < 10 data are ignored; for instance, keeping only the M > 10 data, the linear fit gives 
3.040(3) as the limiting value with x 2 = 3-6 for 9 degrees of freedom, while if we use all the data we obtain 3.021(3) 
with x 2 — 32 for 14 degrees of freedom. We have also tried corrections of the type ln(M)/M but this did not work 
well. Thus we proceed by considering quadratic fits. In that case, the resulting M — oo intercept does not depend 
much on whether one uses all or just the highest values of M. In particular, for all the data, we get the limiting value 
3.046(5) with % 2 = 9.6 for 13 degrees of freedom, while using the M > 10 data only one has 3.06(1) with x 2 = 2.3 for 
8 degrees of freedom. (In all these estimates, the error bars quoted are statistical only, as obtained from the statistical 
fluctuations.) We have also considered power fits, namely E[eo] = a + b/M c . Fitting all the data gives the limiting 
value 3.08(1) with x 2 — 7.2 for 13 degrees of freedom while keeping only the M > 10 data leads to 3.09(3) with 
X 2 = 2.3 for 8 degrees of freedom (in both cases, the exponent c is close to 0.88). Since these x 2 are similar to those 
of the quadratic fits, we see that the systematic errors are not negligible and are at least of the same order as the 
statistical errors; because of these effects, the agreement with the theoretical value of 3.126 can be considered rather 
good. 

We studied similarly the case d = 4. The data again has positive curvature when plotted as a function of 1/M, 
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but since we have less statistics and a much smaller range of M, much less precision can be obtained for the large M 
limit. For the linear fit (M > 9) we get a limiting value of 6.75(3) with % 2 = 4.7 for 4 degrees of freedom. For the 
quadratic fit (M > 9 again), we get 7.22(8) with \ 2 = 0.37 for 3 degrees of freedom. Finally, for the power fit we get 
10.2(9) with x 2 = 1-0 for 5 degrees of freedom; the exponent is c = 0.3 which is small and leads to a large upturn for 
M > 100; clearly that regime is far beyond our reach and suggests that the power fit is probably inappropriate as non 
robust (note for instance that the uncertainty on the limiting value is far higher here than for the other fits). The 
different estimates show that uncertainties arising from systematic effects (M too small) are severe; instead of the 
1% precision we had at d = 3, we have a precision of at best 10% at d = 4 (compare to the theoretical prediction of 
7.703). The conclusion is that numerics do not teach us much for the case d = 4 and so hereafter we shall concentrate 
on the different properties arising when d = 3. 

One of the expectations for the d-index matching problem is that the free energy is self-averaging. Although at 
present there is no proof of such a property, there is no reason to expect otherwise; here we are limited by the numerical 
approach to ground states, but in that framework we can determine empirically the distribution of energies in the 
ensemble of random instances. Fig. H*2l displays the probability distribution of the (extensive) ground-state energy Eq 
for several values of M (d = 3). If as expected, the ground-state energy is self-averaging, the relative width of these 
distributions should go to zero. We have thus measured the first few moments of these distributions. In the inset of 
Fig- El we have plotted the standard deviation a of the ground-state energy divided by \f~M as a function of 1/M. 
Self-averaging corresponds to having a /M — > 0; from the inset we see that a/M 1 / 2 goes to a constant at large M so 
self- averaging holds and the convergence of the distribution is compatible with a central limit theorem type behavior; 
such a scaling arises from sums of not too dependent random variables and leads to a Gaussian limiting shape. To 
confirm this, we have looked at higher moments: we find that indeed the skewness and kurtosis of the distributions 
decrease, in line with a central limit theorem type convergence. 

Having a limiting Gaussian distribution for Eq is not a consequence of the frozen 1RSB pattern of replica symmetry 
breaking since in the random energy model the distribution of Eq follows a Gumbel distribution; furthermore, in 
that case the fluctuations in Eq are 0(1) whereas in the matching problem they are O(VM). To see why such large 
fluctuations are "natural" , consider instead of Eq the quantity £q obtained by adding the lengths ti of the shortest 
hyperedges containing each site i of the first set. This quantity arises in a greedy algorithm (but which does not 
necessarily generate a legal matching) and clearly one has Eq < £q. The central limit theorem applies to £q, so it 
will have a standard deviation that grows as \f~M and its distribution will become Gaussian at large M . The actual 
ground-state energy Eq is obtained by allowing hyperedge lengths that are slightly larger than the ii, but this should 
not suppress the large fluctuations nor prevent the central limit theorem scaling. 

C. Other ground state properties 

As discussed at the beginning of this paper, one expects the hyperedge containing a given site in the ground state 
matching to be one of the shortest possible ones. To investigate this issue quantitatively, let us order all the hyperedges 
containing a given site, going from the shortest to the longest hyperedge. The "order" of a hyperedge is then 1 if it 
is the shortest, 2 if it is the next shortest, etc... The orders arising in the ground state should be dominated by the 
lowest ones, 1, 2, 3... Consider thus the frequencies with which these orders arise; in Fig^5|we show the behavior 
of these frequencies for increasing M in the case d — 3. We see that there is a limiting histogram at large M, and 
that indeed the lowest orders dominate. Furthermore, we see that for large k the probability of occupation of an 
edge tends to decrease exponentially with k (the data are displayed on a semi log plot). Note that in the standard 
matching (d — 2) problem, the decrease goes as l/2 fc exactly, while for our d = 3 case, the exponential decay is only 
asymptotic; furthermore, we have found no simple expression giving the decay rate of this exponential. 

D. Excited states 

Let us consider now states above the ground state. Define the excitation energy or "gap" as E\ — Eq where Eq is 
the extensive ground-state energy and E\ that of the next lowest energy state. In Fig. ^] we show that this random 
variable has a limiting distribution so that E\ — Eq = 0(1) in the large M limit, just as happens in the random energy 
model. Furthermore, the distribution is very well fit by an exponential (cf. the curve shown in the figure). 

Following our theoretical conclusions obtained earlier, consider now the overlap between the ground state and the 
first excited state. In our frozen 1RSB picture, these matchings are expected to have a fixed (self-averaging) overlap 
when M grows. In Fig. ^| we show the probability distribution of such overlaps for increasing M. We see that there 
is a local peak at large overlap that shifts toward q — 1 but which simultaneously decays. The bulk of the overlaps 
however arise around q = 0.3 and when M increases we see that the corresponding peak both gets higher and more 
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FIG. 13: Histogram of the occupation probabilities in the 
ground state of the hyperedges as a function of their order 
k [d = 3). (Order is 1 for the lowest value among those 
hyperedges containing a given site, 2 for the next lowest 
value etc...) At large k, these frequencies approach an ex- 
ponential law. 



FIG. 14: Probability density of the gap, E\ — Eo, that is 
the energy difference between the first excited state and 
the ground state (extensive) energies in the case d = 3. 
The curve is a pure exponential to guide the eye. 
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FIG. 15: Probability density of the overlap q between the 
ground state and the first excited state for increasing M 
(d = 3). 



FIG. 16: Density of energy levels, measured from the 
ground-state energy. At low energies (before finite size 
effects dominate), this density grows exponentially as 
exp[/3 c (_E — Eo)] thereby giving the model's critical tem- 
perature. Shown is the case d = 3 for M = 10. 



narrow. Overall, the behavior is compatible with a convergence toward a Dirac peak near q = 0.32, to be compared 
with the theoretical prediction q c = 0.321. 



E. Low energy entropy 

Finally, consider the density of energy levels. In the case of the random energy model, this density becomes self- 
averaging when the excitation energy grows. We have thus computed the disorder averaged density of levels as a 
function of the excitation energy, E — Eq. That is a measure of the exponential of the microcanonical entropy; within 
the frozen 1RSB scenario, it gives the critical temperature via p(E — E ) ~ exp[(3 c (E — Eo)]. In Fig. El we display 
our numerical estimate of p and see that it is very nearly a pure exponential. From the slope on the semi-log plot we 
extract f3 c w 0.405; this value should be compared to the theoretical prediction of 0.412; the agreement is reasonable 
but not perfect. To get better agreement, we believe it would be necessary to go to larger M and also to go further 
in the self-averaging regime, i.e., to consider larger E — Eq which numerically is an arduous task. 
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VII. CONCLUSION 

We presented an analysis of multi-index matching problems (MIMPs) based on an adaptation of the cavity method 
for finite connectivity systems. For the well-known two-index matching problem, our approach provides an alternative 
derivation of results previously obtained using the replica and cavity methods. With respect to these older studies, the 
present one has the advantages of being closer to the mathematical framework developed by Aldous, and of allowing 
replica symmetry breaking effects to be incorporated in a tractable manner. Exploiting this latter possibility, we 
predict the value of the asymptotic minimal cost to be given for d- index matching problems by = £rs(Ps) with 
ens obtained from Eq. and l|5U|> and (5 S satisfying srs(Ps) — 0. Formally, this d > 3 conjecture differs from the 
case d — 2 (where it is a theorem) in that (3 S = oo when d = 2, while (3 S < oo when d > 3. The distinction between 
2-index and d-index matching problems with d > 3 arises clearly from our analytical and numerical analysis: in the 
first case all low cost matchings share most of their hyperedges, while in the second case they differ from each other 
by a finite fraction of their hyperedges. In mathematical terms, the essential uniqueness property does not hold when 
d > 3 or in physical terms replica symmetry must be broken. Extending Aldous' framework to rigorously account 
for this fact and providing a proof of our conjecture for d > 3 seems to us a particularly interesting mathematical 
challenge. 

From a physical perspective, the qualitative difference between 2-index and d-index matchings problems with d > 3 
hinges on the presence at low temperature of a glassy phase. This is similar to the difference that has been found 
between the 2-SAT and 2-coloring problems, which are polynomial, and the if-SAT and g-coloring problems with 
K > 3 and q > 3, which are NP-complctc. For MIMPs, the nature of the glassy phase is however simpler, as it is made 
of isolated configurations instead of separate clusters of many configurations. We termed this phase a "frozen 1RSB 
glassy phase" and attributed it to the nature of the constraints, called hard constraints. As a technical consequence 
of this distinctive feature, a particular frozen 1RSB Ansatz has to be implemented. Such an Ansatz has repetitively 
been used in the literature as a convenient (but rarely justified) substitute for the more complicated general 1RSB 
Ansatz; our discussion on the role of hard constraints provides a clarification of its conditions of validity which we 
believe is of general interest for the investigation of glassy phases in other systems. 
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APPENDIX A: POPULATION DYNAMICS ALGORITHM 

Here we give a short description of the population dynamics algorithm we used to solve the RS cavity equations. 
We implemented two different versions, corresponding to the two different cut-off procedures mentioned in the text, 
associated either with Poissonian (algorithm P) or regular graphs (algorithm R). In addition to the inputs d and /?, 
the algorithm has essentially 3 parameters: the mean degree of the nodes, C (algorithm P) or K (algorithm R), the 
size of the population, N pop , and the number of iterations Niter- The common structure of the two algorithms is the 
following: 

• Initialize with random values a population of cavity fields x[i], i — 1, . . . ,N pop ; 

• Do A/t r ans = 100 times: UpdateQ; 

• Do Niter times: UpdateQ and MeasureQ. 

The first loop allows the system to equilibrate toward the stationary distribution. The subroutine UpdateQ depends 
on the cut-off procedure and can schematically described as follows: 

Do N pop times: 

• Draw k either at random with Poissonian distribution of mean C (algorithm P), or take k = K (algorithm R); 

• Draw costs {£a}a=i,...,fc either independently with the uniform distribution in [0, C] (algorithm P), or according 
to a Poisson process with rate 1 (algorithm R); 
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• Draw at random k(d — 1) members of the population and use them together with the £ a to compute a new field 
x according to Eq. 

• Draw at random one member of the population and replace its cavity field value with xo- 

The subroutine MeasureQ is implemented similarly and computes the free energy according to Eqs. (|27|) - (|28|l . 
The final output for the free energy is obtained by averaging over the A/jtcr iterations, while the fluctuations across 
iterations are used to check convergence. The algorithm must be run for increasing values of C or K to extrapolate the 
C —>■ oo (algorithm P) or K — > oo (algorithm R) limit, requiring one to consider larger and larger population sizes N pop 
to obtain reliable results. Taking this limit is however facilitated by the numerical observation that the Poissonian 
approximation (algorithm P) approaches the solution from below while the regular approximation (algorithm R) 
approaches it from above; this is illustrated in Fig. [§] with d = 4. We refer to the captions of the various figures for 
typical choices of the parameters C, Af pop and A/lter- The numerical results we obtained for d — 2 are consistent with 
the exact solution, (3 C = oo and Jrs(Pc) = tt 2 /12, and are the following for d = 3,4: 

d = 3: p c = 0.412 ±0.001, f RS (Pc) = 1-042 ± 0.0003, 

d = A: /3 C = 0.135 ±0.002, f RS (Pc) = 1-925 ± 0.0006. ( ' 

The free-energy densities are given here for simple matching problems and their counterpart for <i-partite matchings 
are obtained by multiplying the values by d. 

We have also implemented the generalization of this algorithm to solve the 1RSB cavity equations (|50[) and used it 
to check that no discontinuous transition occurs prior to the entropy crisis (see |27| for algorithmic details). 



APPENDIX B: ASPECTS OF THE BRANCH AND BOUND ALGORITHM 



Our objective is to solve e?-partite matching problems at sufficiently large M so that an extrapolation to the M — > oo 
limit can be performed without too much uncertainty. For many problems (satisfiability, coloring, etc.), one prefers 
an easily implementable algorithm such as one in the class of "heuristic" algorithms; in such approaches one performs 
a fast search for the ground state but no guarantee is provided that the global optimum will be found. Examples of 
these algorithms are simulated annealing and variable depth local search. Heuristic algorithms typically attempt to 
move towards regions of lower energy by searching in the neighborhood of a current configuration. However, since 
the search is local, such an approach is bound to break down for problems in which the frozen 1RSB scenario applies. 
This fact pushed us towards the development of an "exact" algorithm capable of delivering a certificate of optimality 
of the proposed ground state. Amongst exact algorithms, enumeration can be discarded because it is much too slow; 
Branch & Bound gets around this problem through pruning of the enumeration/search. There are also other possible 
methods such as Branch & Cut, but these require an in depth understanding of polytopes and rely on separation 
procedures which have not yet been developped for MIMP. Note that in all exact methods, the key to efficiency is to 
have good bounds; fortunately MIMPs are relatively well adapted to such a strategy. 

We already discussed in the main text our choice of representation of matchings and partial matchings. Given a 
partial matching of the first k sites of the first set, we have to solve a MIMP with M — k sites and so the algorithm can 
be implemented recursively. Since at each node we need to consider all of its possible branchings (naively, there are 
(M — fc) d_1 of these), it is useful to order these branchings according to the length of the corresponding hyperedges, 
going from short to long. Rather than recompute these orderings dynamically every time the partial matching changes, 
we do it once and for all at the initialization of the program. This allows for speed but it must be compensated by 
a rapid determination of whether a given hyperedge is allowed; for that we use a data structure which tells us for 
each site of each set whether it is matched (belongs to one of the occupied hyperedges). This structure is updated 
whenever a partial matching is extended or reduced. 

The pruning of the search must be as stringent as possible, and this depends on the quality of the bounds. Our 
simplest bound B\ is just the current partial matching's energy E^: if that energy is higher than Ejjb (the upper 
bound Ejj b as defined in section IVI A|) , then the whole sub-tree below the current node can be pruned. A better 
bound is B2, obtained by adding to Ek the sum over each remaining unmatched site of the first set of the shortest 
hyperedge containing that site. This sum can be precomputed and tabulated. A still better bound is B3 obtained as 
B2 but where now one takes for each site the shortest hyperedge that is compatible with the current partial matching. 
This bound cannot be predefined once and for all and is slow to compute. Since we have found it to be useful for 
pruning, we have optimized its determination by noticing that it can be tabulated and modified incrementally: every 
time the partial matching is extended (a hyperedge is added), we perform the search for the compatible hyperedges 
of each unmatched site starting from the index (order) previously found to be compatible. When backtracking, one 
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has to remove a hyperedge and there we simply go back to the tables we had at that level: in effect, we maintain 
efficiency if we assign tables at each level and follow their updating one step at a time. 

The rate of pruning is very different for the three bounds, and we found that a good strategy (for balancing pruning 
rate and computation time) was to apply the three bounds successively: if the first one does not prune, one goes on 
to the second one and so forth. To speed up the computation further, we found it useful to implement the recursivity 
of the program in a limited mode only: the data structures are set up once and for all at initialization time, the 
hyperedges are ordered once and for all too, and then the recursion is used mainly to go through the branchings and 
to maintain the tables. Efficiency is gained as no reorganization of the instance (hyperedge weights) is performed, 
and in particular no "smaller matching problem" is ever defined explicitly. 
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